Efficient Learning for Undirected Topic Models
Abstract
The Replicated Softmax model, a well-known undirected topic model, is powerful at extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel estimator, based on Noise Contrastive Estimation, that speeds up learning and is extended to handle documents of varying lengths and weighted inputs. Experiments on two benchmarks show that the new estimator achieves high learning efficiency and high accuracy on document retrieval and classification.
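To make the idea concrete, the following is a minimal sketch of the standard Noise Contrastive Estimation objective applied to a Replicated Softmax style score. It is not the estimator proposed in the paper, whose handling of document lengths and weighted inputs is not described on this page. The function names `rsm_log_unnorm` and `nce_loss`, the parameter shapes, and the noise ratio `nu` are assumptions made for illustration. The score follows the usual Replicated Softmax form, in which the hidden biases are scaled by the document length.

```python
import numpy as np

def rsm_log_unnorm(v, W, a, b):
    """Unnormalized log-probability of one document under a Replicated Softmax.

    v: word-count vector of shape (K,); W: weights of shape (K, H);
    a: hidden biases of shape (H,); b: visible biases of shape (K,).
    All names and shapes are illustrative assumptions.
    """
    v = np.asarray(v, dtype=float)
    doc_len = v.sum()                      # D, the number of words in the document
    hidden_input = doc_len * a + v @ W     # hidden biases are scaled by D
    # Summing out the binary hidden units gives a softplus per hidden unit.
    return float(v @ b + np.logaddexp(0.0, hidden_input).sum())

def nce_loss(log_model_data, log_noise_data,
             log_model_noise, log_noise_noise, nu=1.0):
    """Negative NCE objective for a binary data-versus-noise classifier.

    Each argument holds log-densities: the model and the noise distribution
    evaluated on data samples and on noise samples. In practice the model
    log-density would be the unnormalized score plus a learned
    log-normalizer. nu is the number of noise samples per data sample.
    """
    g_data = np.asarray(log_model_data) - np.asarray(log_noise_data) - np.log(nu)
    g_noise = np.asarray(log_model_noise) - np.asarray(log_noise_noise) - np.log(nu)
    log_h_data = -np.logaddexp(0.0, -g_data)            # log sigmoid(g)
    log_one_minus_h_noise = -np.logaddexp(0.0, g_noise)  # log(1 - sigmoid(g))
    return float(-(log_h_data.mean() + nu * log_one_minus_h_noise.mean()))
```

Because the objective only needs model scores on sampled documents, it avoids the Markov chain sampling step that Contrastive Divergence runs for every update, which is the usual motivation for using NCE with unnormalized models.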
Similar resources
MedLDA: maximum margin supervised topic models
A supervised topic model can use side information such as ratings or labels associated with documents or images to discover more predictive low dimensional topical representations of the data. However, existing supervised topic models predominantly employ likelihood-driven objective functions for learning and inference, leaving the popular and potentially powerful max-margin principle unexploit...
Mining Associated Text and Images with Dual-Wing Harmoniums
We propose a multi-wing harmonium model for mining multimedia data that extends and improves on earlier models based on two-layer random fields, which capture bidirectional dependencies between hidden topic aspects and observed inputs. This model can be viewed as an undirected counterpart of the two-layer directed models such as LDA for similar tasks, but bears significant difference in inferen...
MMH: Maximum Margin Supervised Harmoniums
Exponential family Harmoniums (EFH) are undirected topic models that enjoy nice properties such as fast inference compared to directed topic models. Supervised EFHs can utilize documents’ side information for discovering predictive latent topic representations. However, existing likelihood based estimation does not yield conclusive results. This paper presents a max-margin approach to learning ...
Replicated Softmax: an Undirected Topic Model
We introduce a two-layer undirected graphical model, called a “Replicated Softmax”, that can be used to model and automatically extract low-dimensional latent semantic representations from a large unstructured collection of documents. We present efficient learning and inference algorithms for this model, and show how a Monte-Carlo based method, Annealed Importance Sampling, can be used to produ...
Undirected and Interpretable Continuous Topic Models of Documents
We propose a new type of undirected graphical model suitable for topic modeling and dimensionality reduction for large text collections. Unlike previous Boltzmann machine and harmonium based methods, this new model represents words using Discrete distributions akin to traditional ‘bag-of-words’ methods. However, in contrast to directed topic models such as latent Dirichlet allocation, each word...